- AgentDB Learning Plugins
- What This Skill Does
- Provides access to 9 reinforcement learning algorithms via AgentDB's plugin system. Create, train, and deploy learning plugins for autonomous agents that improve through experience. Includes offline RL (Decision Transformer), value-based learning (Q-Learning), policy gradients (Actor-Critic), and advanced techniques.
- Performance
- Train models 10-100x faster with WASM-accelerated neural inference.
- Prerequisites
- Node.js 18+
- AgentDB v1.0.7+ (via agentic-flow)
- Basic understanding of reinforcement learning (recommended)
- Quick Start with CLI
- Create Learning Plugin
# Interactive wizard
npx agentdb@latest create-plugin
# Use specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent
# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run
# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
List Available Templates
# Show all plugin templates
npx agentdb@latest list-templates
# Available templates:
# - decision-transformer (sequence modeling RL - recommended)
# - q-learning (value-based learning)
# - sarsa (on-policy TD learning)
# - actor-critic (policy gradient with baseline)
# - curiosity-driven (exploration-based)
Manage Plugins
# List installed plugins
npx agentdb@latest list-plugins
# Get plugin information
npx agentdb@latest plugin-info my-agent
# Shows: algorithm, configuration, training status
- Quick Start with API
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';

// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true, // Enable learning plugins
  enableReasoning: true,
  cacheSize: 1000,
});

// Store training experience
await adapter.insertPattern({
  id: '',
  type: 'experience',
  domain: 'game-playing',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding('state-action-reward'),
    pattern: {
      state: [0.1, 0.2, 0.3],
      action: 2,
      reward: 1.0,
      next_state: [0.15, 0.25, 0.35],
      done: false
    }
  }),
  confidence: 0.9,
  usage_count: 1,
  success_count: 1,
  created_at: Date.now(),
  last_used: Date.now(),
});

// Train learning model
const metrics = await adapter.train({
  epochs: 50,
  batchSize: 32,
});
console.log('Training Loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
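The snippet above calls `computeEmbedding` without defining it, and this document does not say which embedding model is expected. For local experiments, a deterministic placeholder might look like the sketch below. Everything in it (`hashEmbedding`, the trigram hashing, the 64-dimension default) is an assumption made for illustration and is not part of AgentDB; a real setup would presumably substitute a proper embedding model with whatever dimension the database expects.

```javascript
// Toy stand-in for the undefined `computeEmbedding` helper (an assumption,
// not an AgentDB API): hashes character trigrams into a fixed-size vector.
function hashEmbedding(text, dim = 64) {
  const vec = new Array(dim).fill(0);
  for (let i = 0; i + 3 <= text.length; i++) {
    // FNV-1a hash of the trigram starting at position i
    let h = 0x811c9dc5;
    for (let j = i; j < i + 3; j++) {
      h ^= text.charCodeAt(j);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    vec[h % dim] += 1;
  }
  // L2-normalize so dot products behave like cosine similarity
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0)) || 1;
  return vec.map((v) => v / norm);
}

// Async wrapper matching the call style used in this document:
//   await computeEmbedding('state-action-reward')
async function computeEmbedding(text) {
  return hashEmbedding(text);
}
```

Because the hash is deterministic, the same text always maps to the same vector, which keeps small experiments reproducible. It carries no semantic meaning beyond shared character trigrams.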
- Available Learning Algorithms (9 Total)
- 1. Decision Transformer (Recommended)
- Type: Offline Reinforcement Learning
- Best For: Learning from logged experiences, imitation learning
- Strengths: No online interaction needed, stable training
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
Use Cases:
- Learn from historical data
- Imitation learning from expert demonstrations
- Safe learning without environment interaction
- Sequence modeling tasks
Configuration:
{
  "algorithm": "decision-transformer",
  "model_size": "base",
  "context_length": 20,
  "embed_dim": 128,
  "n_heads": 8,
  "n_layers": 6
}
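In the Decision Transformer formulation, each step in the input window is paired with its return-to-go, the sum of rewards from that step to the end of the episode. The sketch below shows one way to prepare such a window from logged steps. It assumes steps carry `state`, `action`, and `reward` fields as in the experience format used in this document, and that `context_length` is the window size. `returnsToGo` and `buildContext` are hypothetical helper names; how the plugin builds its inputs internally is not documented here.

```javascript
// Undiscounted return-to-go: R_t = r_t + r_{t+1} + ... + r_T
function returnsToGo(rewards) {
  const rtg = new Array(rewards.length);
  let running = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    running += rewards[t];
    rtg[t] = running;
  }
  return rtg;
}

// Keep only the last `contextLength` steps, each tagged with its return-to-go.
function buildContext(steps, contextLength = 20) {
  const rtg = returnsToGo(steps.map((s) => s.reward));
  const start = Math.max(0, steps.length - contextLength);
  return steps.slice(start).map((s, i) => ({
    returnToGo: rtg[start + i],
    state: s.state,
    action: s.action,
  }));
}

console.log(returnsToGo([1, 0, 2])); // [ 3, 2, 2 ]
```

At inference time, the first return-to-go is typically set to the desired target return, and the observed reward is subtracted after each step.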
- 2. Q-Learning
- Type: Value-Based RL (Off-Policy)
- Best For: Discrete action spaces, sample efficiency
- Strengths: Proven, simple, works well for small/medium problems
npx agentdb@latest create-plugin -t q-learning -n q-agent
Use Cases:
- Grid worlds, board games
- Navigation tasks
- Resource allocation
- Discrete decision-making
Configuration:
{
  "algorithm": "q-learning",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1,
  "epsilon_decay": 0.995
}
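To make the configuration fields concrete, here is a minimal tabular sketch of the standard Q-learning rule, Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)], with epsilon-greedy action selection. The helper names are hypothetical, and treating `epsilon_decay` as a per-episode multiplicative factor is an assumption; the plugin's actual implementation is not shown in this document.

```javascript
// Values mirror the Q-Learning configuration above.
const qConfig = { learning_rate: 0.001, gamma: 0.99, epsilon: 0.1, epsilon_decay: 0.995 };

// With probability epsilon pick a random action, otherwise the greedy one.
function epsilonGreedy(qRow, epsilon, rand = Math.random) {
  if (rand() < epsilon) return Math.floor(rand() * qRow.length);
  return qRow.indexOf(Math.max(...qRow));
}

// Off-policy update: bootstraps from the best next action,
// regardless of which action the agent actually takes next.
function qLearningUpdate(qTable, s, a, r, sNext, done, cfg) {
  const maxNext = done ? 0 : Math.max(...qTable[sNext]);
  const target = r + cfg.gamma * maxNext;
  qTable[s][a] += cfg.learning_rate * (target - qTable[s][a]);
  return qTable[s][a];
}

// Assumed interpretation: shrink epsilon once per episode.
function decayEpsilon(epsilon, cfg) {
  return epsilon * cfg.epsilon_decay;
}
```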
- 3. SARSA
- Type: Value-Based RL (On-Policy)
- Best For: Safe exploration, risk-sensitive tasks
- Strengths: More conservative than Q-Learning, better for safety
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
Use Cases:
- Safety-critical applications
- Risk-sensitive decision-making
- Online learning with exploration
Configuration:
{
  "algorithm": "sarsa",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1
}
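The on-policy character of SARSA shows up in the update target: it bootstraps from the action the agent actually takes next, not from the best available one. A minimal tabular sketch follows; `sarsaUpdate` is a hypothetical name and this is the textbook rule, not necessarily how the plugin implements it.

```javascript
// SARSA update: Q(s,a) += lr * (r + gamma * Q(s', a') - Q(s,a)),
// where a' is the action actually chosen in s' by the current policy.
function sarsaUpdate(qTable, s, a, r, sNext, aNext, done, cfg) {
  const nextValue = done ? 0 : qTable[sNext][aNext];
  const target = r + cfg.gamma * nextValue;
  qTable[s][a] += cfg.learning_rate * (target - qTable[s][a]);
  return qTable[s][a];
}
```

Because exploratory actions feed into the target, states where exploration is costly end up with lower value estimates. This is the usual explanation for SARSA's more conservative behavior compared with Q-Learning.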
- 4. Actor-Critic
- Type: Policy Gradient with Value Baseline
- Best For: Continuous actions, variance reduction
- Strengths: Stable, works for continuous/discrete actions
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
Use Cases:
- Continuous control (robotics, simulations)
- Complex action spaces
- Multi-agent coordination
Configuration:
{
  "algorithm": "actor-critic",
  "actor_lr": 0.001,
  "critic_lr": 0.002,
  "gamma": 0.99,
  "entropy_coef": 0.01
}
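The separate `actor_lr` and `critic_lr` fields reflect two coupled updates: the critic learns state values, and its TD error serves as the advantage signal for the actor. The sketch below is a one-step tabular version with a softmax policy over discrete actions. It is a simplified illustration under stated assumptions: the function names are hypothetical, the entropy bonus (`entropy_coef`) is omitted for brevity, and the plugin itself presumably uses neural networks.

```javascript
// Numerically stable softmax over action preferences.
function softmax(prefs) {
  const maxPref = Math.max(...prefs);
  const exps = prefs.map((p) => Math.exp(p - maxPref));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// One-step actor-critic update.
// theta[s][a]: actor preferences, v[s]: critic value estimates.
function actorCriticStep(theta, v, s, a, r, sNext, done, cfg) {
  // TD error doubles as the advantage estimate
  const tdError = r + (done ? 0 : cfg.gamma * v[sNext]) - v[s];
  // Critic: move V(s) toward the bootstrapped target
  v[s] += cfg.critic_lr * tdError;
  // Actor: gradient of log softmax is (1[b == a] - pi(b|s))
  const pi = softmax(theta[s]);
  for (let b = 0; b < pi.length; b++) {
    const gradLogPi = (b === a ? 1 : 0) - pi[b];
    theta[s][b] += cfg.actor_lr * tdError * gradLogPi;
  }
  return tdError;
}
```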
- 5. Active Learning
- Type: Query-Based Learning
- Best For: Label-efficient learning, human-in-the-loop
- Strengths: Minimizes labeling cost, focuses on uncertain samples
Use Cases:
- Human feedback incorporation
- Label-efficient training
- Uncertainty sampling
- Annotation cost reduction
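Uncertainty sampling, listed above, can be illustrated with predictive entropy: the samples whose predicted class distribution is closest to uniform are sent for labeling first. The sketch below assumes each prediction is an array of class probabilities; `selectMostUncertain` is a hypothetical helper, and entropy is only one of several common uncertainty measures.

```javascript
// Shannon entropy of a probability distribution (natural log).
function entropy(probs) {
  return -probs.reduce((sum, p) => sum + (p > 0 ? p * Math.log(p) : 0), 0);
}

// Return the indices of the k predictions with the highest entropy.
function selectMostUncertain(predictions, k) {
  return predictions
    .map((probs, index) => ({ index, score: entropy(probs) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((item) => item.index);
}

const predictions = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]];
console.log(selectMostUncertain(predictions, 2)); // [ 1, 2 ]
```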
- 6. Adversarial Training
- Type: Robustness Enhancement
- Best For: Safety, robustness to perturbations
- Strengths: Improves model robustness, adversarial defense
Use Cases:
- Security applications
- Robust decision-making
- Adversarial defense
- Safety testing
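This document does not say which perturbation method the plugin uses. As one well-known illustration of the idea, the sketch below applies the fast gradient sign method (FGSM) to a logistic-regression model, where the gradient of the loss with respect to the input has a closed form. The model, the function names, and the choice of FGSM are all assumptions made for the example.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// FGSM for logistic regression with binary cross-entropy loss.
// The loss gradient w.r.t. the input is (sigmoid(w.x + b) - y) * w,
// so the adversarial input is x + epsilon * sign(gradient).
function fgsmPerturb(x, y, w, b, epsilon) {
  const z = x.reduce((sum, xi, i) => sum + w[i] * xi, b);
  const err = sigmoid(z) - y;
  return x.map((xi, i) => xi + epsilon * Math.sign(err * w[i]));
}
```

Adversarial training then mixes such perturbed inputs (with their original labels) into the training data, so the model learns to keep its prediction stable within an epsilon-sized neighborhood.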
- 7. Curriculum Learning
- Type: Progressive Difficulty Training
- Best For: Complex tasks, faster convergence
- Strengths: Stable learning, faster convergence on hard tasks
Use Cases:
- Complex multi-stage tasks
- Hard exploration problems
- Skill composition
- Transfer learning
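A simple way to picture progressive difficulty is a staged schedule: tasks are ordered from easy to hard, and the agent advances once its recent success rate passes a threshold. The sketch below assumes each task has a numeric `difficulty` field and uses an 80% threshold; both, along with the function names, are illustrative choices and not AgentDB settings.

```javascript
// Order tasks from easiest to hardest without mutating the input.
function orderByDifficulty(tasks) {
  return tasks.slice().sort((a, b) => a.difficulty - b.difficulty);
}

// Advance to the next stage when the recent success rate reaches the threshold.
// `recentSuccesses` is an array of booleans for the latest episodes.
function nextStage(stage, recentSuccesses, numStages, threshold = 0.8) {
  const wins = recentSuccesses.filter(Boolean).length;
  const rate = wins / recentSuccesses.length;
  const canAdvance = rate >= threshold && stage < numStages - 1;
  return canAdvance ? stage + 1 : stage;
}
```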
- 8. Federated Learning
- Type: Distributed Learning
- Best For: Privacy, distributed data
- Strengths: Privacy-preserving, scalable
Use Cases:
- Multi-agent systems
- Privacy-sensitive data
- Distributed training
- Collaborative learning
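The privacy property comes from sharing model parameters instead of raw data. A common aggregation rule is federated averaging (FedAvg), where each client's weights are averaged in proportion to its local sample count. The sketch below assumes flat weight arrays of equal length; the data shape and `federatedAverage` are hypothetical, and whether the plugin uses FedAvg specifically is not stated in this document.

```javascript
// Weighted average of client model weights, proportional to sample counts.
// clients: [{ weights: number[], numSamples: number }, ...]
function federatedAverage(clients) {
  const totalSamples = clients.reduce((sum, c) => sum + c.numSamples, 0);
  const dim = clients[0].weights.length;
  const averaged = new Array(dim).fill(0);
  for (const { weights, numSamples } of clients) {
    for (let i = 0; i < dim; i++) {
      averaged[i] += (weights[i] * numSamples) / totalSamples;
    }
  }
  return averaged;
}

console.log(federatedAverage([
  { weights: [1, 2], numSamples: 1 },
  { weights: [3, 4], numSamples: 3 },
])); // [ 2.5, 3.5 ]
```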
- 9. Multi-Task Learning
- Type: Transfer Learning
- Best For: Related tasks, knowledge sharing
- Strengths: Faster learning on new tasks, better generalization
Use Cases:
- Task families
- Transfer learning
- Domain adaptation
- Meta-learning
Training Workflow
1. Collect Experiences
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
  const episode = runEpisode();
  for (const step of episode.steps) {
    await adapter.insertPattern({
      id: '',
      type: 'experience',
      domain: 'task-domain',
      pattern_data: JSON.stringify({
        embedding: await computeEmbedding(JSON.stringify(step)),
        pattern: {
          state: step.state,
          action: step.action,
          reward: step.reward,
          next_state: step.next_state,
          done: step.done
        }
      }),
      confidence: step.reward > 0 ? 0.9 : 0.5,
      usage_count: 1,
      success_count: step.reward > 0 ? 1 : 0,
      created_at: Date.now(),
      last_used: Date.now(),
    });
  }
}
2. Train Model
// Train on collected experiences
const trainingMetrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});
console.log('Training Metrics:', trainingMetrics);
// {
//   loss: 0.023,
//   valLoss: 0.028,
//   duration: 1523,
//   epochs: 100
// }
3. Evaluate Performance
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});
// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;
console.log('Suggested Action:', suggestedAction);
console.log('Confidence:', confidence);
Advanced Training Techniques
Experience Replay
// Store experiences in buffer
const replayBuffer = [];
// Sample random batch for training
const batch = sampleRandomBatch(replayBuffer, 32);
// Train on batch
await adapter.train({
  data: batch,
  epochs: 1,
  batchSize: 32,
});
Prioritized Experience Replay
// Store experiences with priority (TD error)
await adapter.insertPattern({
  // ... standard fields
  confidence: tdError, // Use TD error as confidence/priority
  // ...
});
// Retrieve high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7, // Only high TD-error experiences
});
Multi-Agent Training
// Collect experiences from multiple agents
for (const agent of agents) {
  const experience = await agent.step();
  await adapter.insertPattern({
    // ... store experience with agent ID
    domain: `multi-agent/${agent.id}`,
  });
}
// Train shared model
await adapter.train({
  epochs: 50,
  batchSize: 64,
});
Performance Optimization
Batch Training
// Collect batch of experiences
const experiences = collectBatch(1000);
// Batch insert (500x faster)
for (const exp of experiences) {
  await adapter.insertPattern({ /* ... */ });
}
// Train on batch
await adapter.train({
  epochs: 10,
  batchSize: 128, // Larger batch for efficiency
});
Incremental Learning
// Train incrementally as new data arrives
setInterval(async () => {
  const newExperiences = getNewExperiences();
  if (newExperiences.length > 100) {
    await adapter.train({
      epochs: 5,
      batchSize: 32,
    });
  }
}, 60000); // Every minute
Integration with Reasoning Agents
Combine learning with reasoning for better performance:
// Train learning model
await adapter.train({ epochs: 50, batchSize: 32 });
// Use reasoning agents for inference
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true, // Diverse experiences
  synthesizeContext: true, // Rich context
  optimizeMemory: true, // Consolidate patterns
});
// Make decision based on learned experiences + reasoning
const decision = result.context.suggestedAction;
const confidence = result.memories[0].similarity;
CLI Operations
# Create plugin
npx agentdb@latest create-plugin -t decision-transformer -n my-plugin
# List plugins
npx agentdb@latest list-plugins
# Get plugin info
npx agentdb@latest plugin-info my-plugin
# List templates
npx agentdb@latest list-templates
Troubleshooting
Issue: Training not converging
// Reduce learning rate
await adapter.train({
  epochs: 100,
  batchSize: 32,
  learningRate: 0.0001, // Lower learning rate
});
Issue: Overfitting
// Use validation split
await adapter.train({
  epochs: 50,
  batchSize: 64,
  validationSplit: 0.2, // 20% validation
});
// Enable memory optimization
await adapter.retrieveWithReasoning(queryEmbedding, {
  optimizeMemory: true, // Consolidate, reduce overfitting
});
Issue: Slow training
Enable quantization for faster inference
Use binary quantization (32x faster)
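One loose end from the Experience Replay example under Advanced Training Techniques: it calls `sampleRandomBatch` on a plain array without defining the helper or bounding the buffer. A possible sketch of both pieces follows. The `ReplayBuffer` class, its capacity handling, and the sampling strategy are assumptions for illustration and not AgentDB APIs.

```javascript
// Fixed-capacity buffer that overwrites the oldest experience when full.
class ReplayBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
    this.next = 0; // slot that the next push writes to once the buffer is full
  }

  push(experience) {
    if (this.items.length < this.capacity) {
      this.items.push(experience);
    } else {
      this.items[this.next] = experience;
    }
    this.next = (this.next + 1) % this.capacity;
  }
}

// Uniform sampling without replacement via a partial Fisher-Yates shuffle.
// Returns at most `batchSize` items and leaves the input array untouched.
function sampleRandomBatch(buffer, batchSize, rand = Math.random) {
  const count = Math.min(batchSize, buffer.length);
  const pool = buffer.slice();
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}
```

With this in place, `sampleRandomBatch(buffer.items, 32)` yields a batch in the shape the replay example expects. Bounding the buffer keeps memory use flat during long-running incremental training.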
- Learn More
- Algorithm Papers: See docs/algorithms/ for detailed papers
- GitHub: https://github.com/ruvnet/agentic-flow/tree/main/packages/agentdb
- MCP Integration: npx agentdb@latest mcp
- Website: https://agentdb.ruv.io
- Category: Machine Learning / Reinforcement Learning
- Difficulty: Intermediate to Advanced
- Estimated Time: 30-60 minutes